A taxonomy for similarity metrics between Markov decision processes
Authors
Abstract
Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the knowledge acquired in the learning of a set of source tasks in a new learning process on a target task, assuming that the source and target tasks are close enough. In recent years, transfer learning has succeeded in making reinforcement learning (RL) algorithms more efficient (e.g., by reducing the number of samples needed to achieve (near-)optimal performance). Transfer in RL is based on the core concept of similarity: whenever the tasks are similar, the transferred knowledge can be reused to solve the target task and significantly improve the learning performance. Therefore, the selection of good metrics to measure these similarities is a critical aspect when building transfer RL algorithms, especially when this knowledge is transferred from simulation to the real world. In the literature, there are many metrics to measure the similarity between MDPs; hence, many definitions of similarity and its complement, distance, have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far, taking this categorization into account. We also follow this taxonomy to survey the existing metrics, as well as suggesting future directions for the construction of new metrics.
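To illustrate the kind of state-similarity metric the survey categorizes, the sketch below iterates a bisimulation-style fixed point on a hypothetical three-state MDP. The exact Kantorovich (optimal-transport) term of the bisimulation metric is replaced here by the independent-coupling upper bound, which coincides with it when transitions are deterministic, as in this toy example; the MDP and all names are illustrative assumptions, not taken from the paper.

```python
# Bisimulation-style distance between states of a hypothetical finite MDP:
#   d(s, t) = max_a [ |R(s, a) - R(t, a)| + gamma * K(P(.|s, a), P(.|t, a); d) ]
# The Kantorovich term K is approximated by the independent-coupling bound
#   sum_{u, v} p(u) * q(v) * d(u, v),
# which is exact for deterministic transitions, as in this toy MDP.
GAMMA = 0.9

# Toy MDP: R[s][a] = reward, P[s][a] = {next_state: probability}.
R = {0: {"a": 1.0, "b": 0.0},
     1: {"a": 1.0, "b": 0.0},
     2: {"a": 0.0, "b": 1.0}}
P = {0: {"a": {1: 1.0}, "b": {0: 1.0}},
     1: {"a": {2: 1.0}, "b": {1: 1.0}},
     2: {"a": {0: 1.0}, "b": {2: 1.0}}}

def coupling_bound(p, q, d):
    """Independent-coupling upper bound on the Kantorovich distance."""
    return sum(pu * qv * d[u, v] for u, pu in p.items() for v, qv in q.items())

def bisim_distance(n_iter=200):
    """Iterate the metric update to (numerical) convergence."""
    states = list(R)
    d = {(s, t): 0.0 for s in states for t in states}
    for _ in range(n_iter):
        d = {(s, t): max(abs(R[s][a] - R[t][a])
                         + GAMMA * coupling_bound(P[s][a], P[t][a], d)
                         for a in R[s])
             for s in states for t in states}
    return d
```

For this toy MDP the iteration converges to d(0, 1) = 9 and d(1, 2) = d(0, 2) = 10: states 0 and 1 agree on immediate rewards and only diverge one step later, which the discount factor attenuates.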
Similar resources
Bisimulation Metrics for Continuous Markov Decision Processes
In recent years, various metrics have been developed for measuring the behavioural similarity of states in probabilistic transition systems [Desharnais et al., Proceedings of CONCUR, (1999), pp. 258-273, van Breugel and Worrell, Proceedings of ICALP, (2001), pp. 421-432]. In the context of finite Markov decision processes, we have built on these metrics to provide a robust quantitative analogue...
Metrics for Finite Markov Decision Processes
Markov decision processes (MDPs) offer a popular mathematical tool for planning and learning in the presence of uncertainty (Boutilier, Dean, & Hanks 1999). MDPs are a standard formalism for describing multi-stage decision making in probabilistic environments. The objective of the decision making is to maximize a cumulative measure of longterm performance, called the return. Dynamic programming...
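The dynamic-programming view described above can be sketched in a few lines of value iteration; the two-state MDP and all names here are illustrative assumptions, not from the cited work:

```python
# Minimal value iteration on a hypothetical two-state MDP:
#   V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) * V(s') ]
GAMMA = 0.9

# State 1 is absorbing and pays reward 1 per step; state 0 can move to it.
R = {0: {"move": 0.0, "stay": 0.0},
     1: {"stay": 1.0}}
P = {0: {"move": {1: 1.0}, "stay": {0: 1.0}},
     1: {"stay": {1: 1.0}}}

def value_iteration(n_iter=500):
    """Repeatedly apply the Bellman optimality update until convergence."""
    V = {s: 0.0 for s in R}
    for _ in range(n_iter):
        V = {s: max(R[s][a] + GAMMA * sum(p * V[u] for u, p in P[s][a].items())
                    for a in R[s])
             for s in R}
    return V
```

Here V(1) converges to 1 / (1 - gamma) = 10 (the discounted return of staying in the absorbing state) and V(0) to gamma * V(1) = 9.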
Computing Game Metrics on Markov Decision Processes
In this paper we study the complexity of computing the game bisimulation metric defined by de Alfaro et al. on Markov Decision Processes. It is proved by de Alfaro et al. that the undiscounted version of the metric is characterized by a quantitative game μ-calculus defined by de Alfaro and Majumdar, which can express reachability and ω-regular specifications. And by Chatterjee et al. that the d...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
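The decomposition step described above (not the accelerated algorithm of the cited paper itself) can be sketched with Kosaraju's algorithm: compute the SCCs of the MDP's transition graph, which come out in a topological order of the condensation, inducing the levels. The graph below is a made-up example.

```python
# Strongly connected components of a directed transition graph via
# Kosaraju's algorithm. Components are emitted in topological order of the
# condensation (components that can reach others come first), which is the
# ordering that induces the "levels" of SCC-based decomposition methods.
from collections import defaultdict

def sccs(graph):
    """graph: dict node -> list of successor nodes. Returns list of sets."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}

    # Pass 1: record DFS finish order on the original graph.
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in graph.get(u, []):
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in nodes:
        if u not in seen:
            dfs(u)

    # Pass 2: DFS on the transpose graph in reverse finish order.
    rev = defaultdict(list)
    for u, vs in graph.items():
        for v in vs:
            rev[v].append(u)
    comp_of, comps = {}, []
    def assign(u, comp):
        comp_of[u] = len(comps) - 1
        comp.add(u)
        for v in rev[u]:
            if v not in comp_of:
                assign(v, comp)
    for u in reversed(order):
        if u not in comp_of:
            comps.append(set())
            assign(u, comps[-1])
    return comps
```

For example, `sccs({0: [1], 1: [0, 2], 2: [3], 3: [2]})` returns `[{0, 1}, {2, 3}]`: the component containing states 0 and 1 can reach the one containing 2 and 3, so it sits at an earlier level.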
Metrics for Markov Decision Processes with Infinite State Spaces
We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning tas...
Journal
Journal title: Machine Learning
Year: 2022
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-022-06242-4